ABSTRACT

Orthogonal frequency division multiplexing (OFDM) systems divide the entire channel into many narrow parallel
sub-channels, increasing the symbol duration and reducing the inter-symbol interference (ISI) caused by multipath
propagation. Multiple-input multiple-output (MIMO) systems, which use multiple antennas at the transmitter and
receiver, can exhibit substantially higher spectral efficiency and improve the system capacity significantly.
Therefore, the combination of MIMO and OFDM, called MIMO-OFDM, has emerged as a major candidate for
fourth-generation communications. First, in this paper, we introduce an enhanced vertical Bell Labs layered
space-time (V-BLAST) receiver which takes the decision errors into account. Second, we propose a novel iterative
detection and decoding (IDD) scheme for coded layered space-time architectures in MIMO-OFDM systems. For the
iterative process, a low-complexity demapper is developed by making use of both non-linear interference
cancellation and linear minimum mean-square error filtering, and a low-complexity algorithm for log-likelihood
ratio (LLR) calculation is also developed. Simulation results demonstrate that the proposed method approaches the
performance of the optimal turbo-MIMO approach while providing a considerable reduction in both latency and
computational complexity.

KEYWORDS:  Iterative  Detection  and  Decoding  (IDD),  Multiple-Input-Multiple-Output  (MIMO),  Orthogonal 
Frequency-Division  Multiplexing  (OFDM),  Vertical  Bell  Labs  Layered  Space-Time  (V-BLAST) 

INTRODUCTION 

The current demand for broadband multimedia services, ubiquitous networking, and Internet access using portable
devices such as PDAs, cellular terminals, and laptops is growing at such an enormous pace that it has pushed the
development of modem and system architectures for high-speed data. Multiple-input multiple-output (MIMO) systems
make use of multiple antennas at the transmitter and receiver. A MIMO system takes advantage of the spatial
diversity that is obtained by spatially separated antennas in a dense multipath scattering environment. The layered
space-time architecture has promised extremely high spectral efficiency, and among the layered space-time
architectures, V-BLAST exhibits the best tradeoff between performance and complexity [1]. The V-BLAST uses a
combination of linear and nonlinear detection techniques.

Figure 1: General Block Diagram of Communication System

First, we introduce an enhanced V-BLAST detection algorithm which takes the error propagation effect into
account  [2].  By  including  the  decision  errors  into  the  filtering  formulation,  an  improved  detection  performance  is  achieved. 
Second,  employing  the  enhanced  V-BLAST  as  a  front-end  receiver  for  MIMO-OFDM  systems,  we  propose  an  iterative 
detection  and  decoding  (IDD)  approach  which  further  improves  the  detection  performance  by  utilizing  decoder  output. 
The  high  computational  complexity  of  IDD,  however,  poses  significant  challenges  for  practical  implementations 
(in  terms  of  circuit  area,  latency,  throughput  and  power  consumption).  So,  we  include  a  novel  iterative  receiver  schedule, 
which  simultaneously  performs  detection  and  decoding  on  the  same  code  block.  This  novel  IDD  approach  is  referred  to  as 
layered  detection  and  decoding  (LDD)  and  achieves  lower  latency  and  better  performance  compared  to  conventional 
solutions.  Figure  1  shows  the  location  of  the  proposed  scheme  in  the  complete  communication  system. 

ENHANCED  V-BLAST  WITH  ERROR  COMPENSATION 

We investigate coded layered space-time architectures for frequency-selective fading multiple-input
multiple-output orthogonal frequency-division multiplexing (OFDM) channels. The V-BLAST uses a combination of
linear and nonlinear detection techniques: first nulling the interference from yet-undetected signals, and then
canceling out the interference using already-detected signals, as shown in Figure 2. By computing outage capacity
formulas, we will indicate that the capacity of the vertical Bell Labs layered space-time (V-BLAST) architecture
becomes closer to the Shannon capacity in the frequency-selective OFDM environment. First, we start with a
comprehensive signal model which takes error propagation into account. We then derive an improved signal detector
and describe the optimal soft-bit log-likelihood ratio (LLR) computation method, which includes the decision
errors, for soft-input channel decoding. Finally, simulations show that the proposed schemes provide a significant
performance improvement over the conventional methods.

Figure 2: V-BLAST Architecture

V-BLAST  Signal  Processing  Algorithms 

The V-BLAST signal processing algorithms, used at the receiver, are the heart of the technique. At the receiving
antennas, high-speed signal processors look at the signals from all the receiver antennas simultaneously, first
extracting the strongest sub-stream from the morass, then proceeding with the remaining weaker signals, which are
easier to recover once the stronger signals have been removed as a source of interference. The ability to separate
the sub-streams depends on the slight differences in the way the different sub-streams propagate through the
environment. Under the widely used theoretical assumption of independent Rayleigh scattering, the theoretical
capacity of the V-BLAST architecture grows roughly linearly with the number of transmitter antennas, even when the
total transmitted power is kept constant. In the real world, scattering will be less favourable than the
independent Rayleigh assumption, and it remains to be seen how much capacity is actually available in different
propagation environments. Nevertheless, even in relatively poor scattering environments, V-BLAST provides
significantly higher capacities than conventional architectures. It has already demonstrated spectral efficiencies
of 20 to 40 bits per second per Hertz of bandwidth, numbers which are unattainable using standard techniques.
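
The "strongest first, then cancel" procedure described above can be illustrated with a minimal sketch of
conventional V-BLAST detection using zero-forcing nulling and ordered successive interference cancellation. This is
not the enhanced detector of this paper (it ignores decision errors). The real-valued channel, the BPSK slicer, and
the helper names `mat_inv`, `pinv_rows` and `vblast_zf_sic` are assumptions made only for illustration.

```python
def mat_inv(a):
    """Invert a small square real matrix (Gauss-Jordan with partial pivoting)."""
    n = len(a)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def pinv_rows(cols):
    """Rows of the pseudo-inverse (H^T H)^-1 H^T, with H given as a list of columns."""
    k, m = len(cols), len(cols[0])
    gram = [[sum(x * y for x, y in zip(cols[i], cols[j])) for j in range(k)]
            for i in range(k)]
    g_inv = mat_inv(gram)
    return [[sum(g_inv[i][j] * cols[j][t] for j in range(k)) for t in range(m)]
            for i in range(k)]


def vblast_zf_sic(h_cols, r, slicer):
    """Detect one symbol per transmit antenna by ordered nulling and cancellation."""
    r = list(r)
    remaining = list(range(len(h_cols)))
    detected = {}
    while remaining:
        w = pinv_rows([h_cols[i] for i in remaining])
        # Ordering: the smallest nulling-vector norm gives the highest post-detection SNR.
        k = min(range(len(remaining)), key=lambda i: sum(x * x for x in w[i]))
        idx = remaining.pop(k)
        # Nulling (linear step) followed by a hard decision.
        s_hat = slicer(sum(wi * ri for wi, ri in zip(w[k], r)))
        detected[idx] = s_hat
        # Cancellation (nonlinear step): remove the detected stream from the received vector.
        r = [ri - s_hat * hi for ri, hi in zip(r, h_cols[idx])]
    return [detected[i] for i in range(len(h_cols))]


# Hypothetical 3x3 real channel (one list per transmit antenna) and a noise-free observation.
h_cols = [[1.0, 0.2, 0.1], [0.3, 0.9, -0.2], [-0.1, 0.4, 1.1]]
sent = [1.0, -1.0, 1.0]
received = [sum(s * col[t] for s, col in zip(sent, h_cols)) for t in range(3)]
print(vblast_zf_sic(h_cols, received, lambda y: 1.0 if y >= 0 else -1.0))
```

Because every later stage relies on the earlier hard decisions, a wrong decision propagates through the remaining
stages, which is the effect the enhanced receiver is designed to compensate.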

ITERATIVE  DETECTION  AND  DECODING 

We now describe an IDD scheme combined with V-BLAST for MIMO-OFDM systems. In this section, we exploit the channel
coding gain to further improve the performance. Compared with the turbo-MIMO receiver, one big difference in the
proposed IDD block is that the MIMO demapper block is replaced by a single-input single-output (SISO) demapper.
Thus, a complex BCJR decoder can be replaced by a much simpler Viterbi decoder to reduce the computational
complexity further, as shown in Figure 4. The key mechanism of the IDD process is the information exchange between
the MIMO detector and the channel decoder, leading to successive performance improvement. They exchange soft
information in the form of the log-likelihood ratio (LLR) of each bit [3]. First, the MIMO detector processes the
received signal and the soft information delivered from the channel decoder to obtain the LLRs of all coded bits
(called extrinsic information). This extrinsic information is delivered to the SISO channel decoder through a
deinterleaver, where it is treated as a priori information. Based on this a priori information, the channel decoder
computes the LLRs of the coded bits, which form new extrinsic information that can be used for better MIMO
detection. These extrinsic LLRs are interleaved and fed back to the MIMO detector as a priori information. This
procedure completes one iteration cycle, and the iterations continue until the performance reaches a desired level.
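
To make the notion of extrinsic information concrete, the following is a minimal sketch of a generic soft demapper
that turns one received symbol and the a priori LLRs from the decoder into extrinsic LLRs. It is not the
low-complexity demapper proposed in this paper. The scalar AWGN model (interference already removed), the sign
convention L = ln P(b=0)/P(b=1), the Gray-mapped QPSK constellation, and the function names are all assumptions
made for illustration.

```python
import math


def logsumexp(values):
    """Numerically stable ln(sum(exp(v))) for a non-empty list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def siso_demap(y, constellation, noise_var, prior_llrs):
    """Extrinsic LLRs of the bits carried by one symbol observed as y = s + n.

    constellation: dict mapping bit tuples to complex symbols (assumed layout).
    noise_var:     variance of the complex noise n.
    prior_llrs:    a priori LLRs, one per bit, with L = ln P(b=0)/P(b=1).
    """
    extrinsic = []
    for k in range(len(prior_llrs)):
        metrics = {0: [], 1: []}
        for bits, s in constellation.items():
            m = -abs(y - s) ** 2 / noise_var
            # Add the priors of all bits except bit k, so the output for bit k is
            # extrinsic. ln P(b_j) = -b_j * L_j + const, and the constant cancels.
            m += sum(-b * llr for j, (b, llr) in enumerate(zip(bits, prior_llrs)) if j != k)
            metrics[bits[k]].append(m)
        extrinsic.append(logsumexp(metrics[0]) - logsumexp(metrics[1]))
    return extrinsic


A = 1 / math.sqrt(2)
QPSK = {(0, 0): complex(A, A), (0, 1): complex(A, -A),
        (1, 0): complex(-A, A), (1, 1): complex(-A, -A)}

# First pass: no decoder feedback yet, so all priors are zero.
print(siso_demap(0.5 + 0.2j, QPSK, noise_var=0.5, prior_llrs=[0.0, 0.0]))
```

In an iterative receiver, the returned values would be deinterleaved and passed to the decoder, and the decoder's
extrinsic output would come back as `prior_llrs` in the next iteration.
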

Figure 3: Receiver Structure of Proposed IDD Scheme

Figure 4: IDD Architecture

NOVEL ITERATIVE APPROACH

The  high  computational  complexity  of  IDD,  however,  poses  significant  challenges  for  practical  implementations 
(in terms of circuit area, latency, throughput and power consumption) [5]. In this paper, we propose a novel iterative
receiver  schedule,  which  simultaneously  performs  detection  and  decoding  on  the  same  code  block.  This  novel  IDD 
approach  is  referred  to  as  layered  detection  and  decoding  (LDD)  and  achieves  lower  latency  and  better  performance 
compared  to  conventional  solutions. 

We show that LDD is able to substantially simplify the task of matching the throughput of the detector and
decoder units, while being able to achieve lower latency and better performance than conventional IDD schemes.
Non-iterative receivers (I = 1, where I denotes the number of iterations) resemble a coarse-grained pipelined
architecture consisting of two stages, where the first stage corresponds to a soft-output detector and the second
stage to the channel decoder. In such an architecture, the overall throughput is limited by the maximum runtime of
either the detector or the decoder unit, i.e., we have

$$T = \frac{C}{\max\{t_{\mathrm{detector}},\, t_{\mathrm{decoder}}\}}$$

where C denotes the code-word size, $t_{\mathrm{detector}}$ stands for the time required by the detector to compute
the LLR values (1), and $t_{\mathrm{decoder}}$ is the time required by the channel decoder to compute a set of new
a-priori LLRs (and the estimates for the transmitted bits). In the following, we refer to both quantities
$t_{\mathrm{detector}}$ and $t_{\mathrm{decoder}}$ as the runtimes of the two units.

Serial  Architecture 

The serial architecture is a straightforward design for an IDD receiver. A shared memory
(used for storing the LLR values) is connected to both the detector and the decoder unit, as shown in Figure 5.
One code-word block is processed in an alternating fashion in both units. Specifically, the throughput of this
architecture corresponds to

$$T_{\mathrm{ser}} = \frac{C}{I\,(t_{\mathrm{detector}} + t_{\mathrm{decoder}})}$$


The  latency  associated  with  the  serial  architecture  behaves  similarly  and  increases  linearly  in  the  number  of 
iterations  as 

$$L_{\mathrm{ser}} = \frac{C}{T_{\mathrm{ser}}} = I\,(t_{\mathrm{detector}} + t_{\mathrm{decoder}})$$

In  addition  to  the  rather  poor  throughput  and  latency  behaviour  of  the  serial  architecture,  it  is  important  to  realize 
that  one  of  the  two  units  in  this  architecture  is  always  idle.  Hence,  the  serial  architecture  is  highly  sub-optimal  from  a 
resource  utilization  point-of-view. 

Ping-Pong  Architecture 

The ping-pong architecture uses pipeline interleaving to process two different codewords within the two pipeline
stages. Specifically, while the LLR values associated with codeword 'A' are processed in the detector, codeword 'B'
is processed concurrently in the decoder unit. The interleaving of two codeword blocks allows both units to be
utilized simultaneously, which increases the throughput (compared to the serial schedule) to

$$T_{\mathrm{pp}} = \frac{C}{I \max\{t_{\mathrm{detector}},\, t_{\mathrm{decoder}}\}}$$

However, to achieve full hardware utilization, the runtimes of the detector and decoder units,
$t_{\mathrm{detector}}$ and $t_{\mathrm{decoder}}$, must be matched. A mismatch between the two runtimes forces one
unit into an idle phase, which degrades the throughput.
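
The trade-off between the serial and ping-pong schedules can be checked numerically with a small sketch. It assumes
the throughput expressions take the forms C / max(t_det, t_dec) for the non-iterative pipeline, C / (I (t_det +
t_dec)) for the serial schedule, and C / (I max(t_det, t_dec)) for the ping-pong schedule. The idle-fraction formula
and all numbers below are illustrative assumptions, not results from this paper.

```python
def non_iterative_throughput(c, t_det, t_dec):
    """Two-stage pipeline (I = 1): limited by the slower of the two units."""
    return c / max(t_det, t_dec)


def serial_schedule(c, iters, t_det, t_dec):
    """One codeword alternates between detector and decoder; one unit is always idle."""
    latency = iters * (t_det + t_dec)
    return {"throughput": c / latency, "latency": latency}


def ping_pong_throughput(c, iters, t_det, t_dec):
    """Two codewords are interleaved so that both units can work at the same time."""
    return c / (iters * max(t_det, t_dec))


def ping_pong_idle_fraction(t_det, t_dec):
    """Share of each pipeline slot that the faster unit spends waiting (assumed model)."""
    return 1.0 - min(t_det, t_dec) / max(t_det, t_dec)


# Hypothetical numbers: 1200-bit codeword, 4 iterations, runtimes in microseconds.
c, iters, t_det, t_dec = 1200, 4, 30.0, 20.0
print("non-iterative:", non_iterative_throughput(c, t_det, t_dec), "bits/us")
print("serial:", serial_schedule(c, iters, t_det, t_dec))
print("ping-pong:", ping_pong_throughput(c, iters, t_det, t_dec), "bits/us")
print("ping-pong idle fraction:", ping_pong_idle_fraction(t_det, t_dec))
```

With mismatched runtimes the ping-pong schedule is still faster than the serial one, but the faster unit idles for
part of every slot, which is the runtime-matching problem described above.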

Layered  Detection  and  Decoding 

The key idea of layered detection and decoding (LDD) is to get rid of the sequential dependency between
detection and decoding altogether. LDD is not merely another architecture option for conventional IDD schedules,
but a new schedule of its own. With LDD, the SISO detector and channel decoder process the same block of LLR values
simultaneously (see Figure 6). Since both units can now operate independently and in parallel without needing to be
synchronized, the utilization of the detector and decoder units can be maximized without matching the respective
runtimes. Since LDD avoids the notion of iterations, one can get rid of the strict dichotomy between the
SISO detector and the channel decoder that causes the rather long latency associated with IDD.

SIMULATION RESULTS

Simulation  Results  for  Enhanced  V-BLAST 

First, we compare the performance of the enhanced V-BLAST with the conventional V-BLAST. Here, we
consider flat fading. The packet size is taken as 100, and the block length is set to 200. No iterative decoding is
assumed for the evaluation. A rate-1/2 binary convolutional code with generator polynomials (133, 171) in octal
notation is used for the simulations.
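
For reference, a rate-1/2 convolutional encoder with generator polynomials (133, 171) in octal can be sketched as
follows. The constraint length of 7 follows from the polynomials. The tap ordering (most significant bit of each
polynomial applied to the current input bit), the zero initial state, and the name `conv_encode` are assumptions,
since the simulation setup does not state them.

```python
GENERATORS = (0o133, 0o171)  # generator polynomials in octal notation
K = 7                        # constraint length implied by the 7-bit polynomials


def conv_encode(bits):
    """Encode a list of 0/1 bits; returns two coded bits per input bit (rate 1/2)."""
    state = 0  # K-bit shift register, newest bit in the most significant position
    coded = []
    for b in bits:
        state = ((b << (K - 1)) | (state >> 1)) & ((1 << K) - 1)
        for g in GENERATORS:
            # Each output bit is the parity (XOR) of the register taps selected by g.
            coded.append(bin(state & g).count("1") & 1)
    return coded


# Impulse response: a single 1 followed by zeros reads out both polynomials bit by bit.
print(conv_encode([1, 0, 0, 0, 0, 0, 0]))
```

The sketch does not append tail bits to return the encoder to the zero state; whether the simulations terminate
each block this way is not specified.
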

For 4bps, we can see that the enhanced V-BLAST provides about 6 dB gain at 1% FER over the conventional
V-BLAST  as  shown  in  Figure  7.  The  improvements  are  achieved  by  considering  the  decision  errors  in  the  equalization 
process  and  the  soft  bit  metric  generation.  As  observed  in  Figure  6,  the  gain  of  the  enhanced  V-BLAST  over  the 
conventional  V-BLAST  increases  to  8  dB  at  1%  FER  for  the  case  of  8bps.  These  results  confirm  that  the  decision  error 
compensation  is  crucial  for  the  coded  layered  space-time  architectures. 

Simulation  Results  for  IDD  (Comparison  of  IDD  with  Various  Schemes) 

In order to demonstrate the performance of the proposed scheme, we compare the following systems, as shown in
Figure 9.

•  The  IDD  with  ML  Detector:  Applying  the  conventional  V-BLAST  with  ML  detector  in  the  IDD  block. 

•  The  IDD  with  ZF  Detector:  Applying  the  conventional  V-BLAST  with  ZF  detector  in  the  IDD  block. 

•  The IDD with V-BLAST Detector: Applying the conventional V-BLAST with V-BLAST detector in the IDD block.

•  The  Proposed  IDD  with  ZF  Detector:  Applying  the  conventional  V-BLAST  with  ZF  detector  in  the  IDD  block. 

•  The  Proposed  IDD:  Applying  the  enhanced  V-BLAST  with  Viterbi  algorithm  in  the  IDD  block. 

Figure 9: BER vs Eb/N0

Simulation Results for LDD


Figure  10:  LDD  Latency  Performance 


The  proposed  LDD  schedule  is  particularly  suited  for  iterative  receivers  based  on  LDPC  decoders.  In  that  case, 
the  fact  that  no  interleaving  is  required  by  the  employed  LDPC  decoder  simplifies  the  concurrent  memory  access  of  the 
detector  and  the  decoder  and  enables  its  efficient  implementation  which  is  shown  in  Figure  10. 

CONCLUSIONS 

In  this  paper,  we  have  proposed  pragmatic  schemes  for  the  layered  space-time  architectures  in  MIMO-OFDM 
systems.  Employing  the  enhanced  V-BLAST  as  a  front-end  demodulator,  the  proposed  IDD  scheme  enables  us  to  achieve 
further  performance  gain.  LDD  significantly  reduces  the  processing  latency  compared  to  existing  IDD  architectures. 
The  proposed  LDD  scheme  is  particularly  well  suited  for  wireless  standards  which  mandate  stringent  latency  constraints. 
Simulation results show that the performance of the proposed iterative scheme is less than 1 dB away from the
near-optimum  turbo-MIMO  for  all  the  simulation  configurations  with  remarkably  reduced  complexity.  The  simulation 
results  confirm  that  by  properly  treating  the  decision  errors  in  interference  cancellation,  the  detrimental  effects  of  error 
propagation  can  be  almost  completely  overcome  by  the  proposed  iterative  processing. 